Methods for capturing spectro-temporal modulations in automatic speech recognition
Abstract
Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns that are well matched by the receptive fields of cortical neurons. To improve the performance of automatic speech recognition (ASR) systems, a number of different approaches are presented, all of which aim to capture spectro-temporal modulations. The tuning of neurons to different envelope fluctuations is modeled by deriving secondary features from the output of a perception model. The following types of secondary features are introduced: the product of two or more windows of variable size in the spectro-temporal representation (sigma-pi cells), a fuzzy-logical combination of windows, and a Gabor function that models the shape of the receptive fields of cortical neurons. The approaches are tested on a simple isolated word recognition task and compared to a standard Hidden Markov Model recognition system. The results show that all types of secondary features are suitable for ASR. Gabor secondary features, in particular, yield robust performance in additive noise, which is comparable and in some conditions superior to that of the Aurora 2 reference system.
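As a rough illustration of two of the secondary feature types named in the abstract, the following sketch computes a sigma-pi feature and a Gabor feature on a spectro-temporal representation stored as a NumPy array of shape (frequency channels, time frames). It is not the authors' implementation: the function names, the window format, the use of the window mean as the "sigma" step, and all parameter values are assumptions made for this example. The fuzzy-logical combination is not shown, because the abstract does not specify its operators.

```python
import numpy as np


def sigma_pi_feature(spec, windows):
    """Product ("pi") of the mean energy ("sigma") in several windows.

    `windows` is a list of (f_lo, f_hi, t_lo, t_hi) half-open index ranges.
    The window format and the use of the mean are assumptions of this sketch.
    """
    value = 1.0
    for f_lo, f_hi, t_lo, t_hi in windows:
        value *= spec[f_lo:f_hi, t_lo:t_hi].mean()
    return value


def gabor_kernel(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Complex 2D Gabor function: Gaussian envelope times a complex sinusoid."""
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((F / sigma_f) ** 2 + (T / sigma_t) ** 2))
    carrier = np.exp(1j * (omega_f * F + omega_t * T))
    return envelope * carrier


def gabor_feature(spec, kernel, f0, t0):
    """Magnitude of the kernel response centred at channel f0, frame t0."""
    k_f, k_t = kernel.shape
    f_lo, t_lo = f0 - k_f // 2, t0 - k_t // 2
    if f_lo < 0 or t_lo < 0:
        raise ValueError("kernel extends beyond the lower edge of the input")
    patch = spec[f_lo:f_lo + k_f, t_lo:t_lo + k_t]
    if patch.shape != kernel.shape:
        raise ValueError("kernel extends beyond the upper edge of the input")
    # Correlate the patch with the complex kernel; the magnitude makes the
    # result largely insensitive to the phase of the modulation.
    return np.abs(np.sum(patch * np.conj(kernel)))
```

With a kernel tuned to a purely temporal modulation (`omega_f = 0`), the Gabor feature should respond much more strongly to an input whose envelope fluctuates at the matching rate than to a flat input, which is the kind of modulation tuning the abstract attributes to cortical receptive fields.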
Related resources
Spectro-temporal Modulations for Robust Speech Emotion Recognition
Spectro-temporal directional derivative features for automatic speech recognition
We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speec...
Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank
This paper analyzes the application of methods developed in automatic speech recognition (ASR) to better understand neural activity measured with electrocorticography (ECoG) during the presentation of speech. ECoG data is collected from temporal cortex in two subjects listening to a matrix sentence test. We investigate the relation of ECoG signals and acoustic speech that has been processed wit...
Spectro-temporal modulations for robust speech emotion recognition
Speech emotion recognition is mostly considered in clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and are applied to detect the emotion status of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database with added white and babble noises under various SNR levels. The clean train/noisy test scenario is in...
Session 2pSCa: Speech Communication 2pSCa2. Improving automatic speech recognition by learning from human errors
This work presents a series of experiments that compare the performance of human speech recognition (HSR) and automatic speech recognition (ASR). The goal of this line of research is to learn from the differences between HSR and ASR, and to use this knowledge to incorporate new signal processing strategies from the human auditory system in automatic classifiers. A database with noisy nonsense u...
Publication date: 2001